Using the K-Nearest Neighbor Method and SMART Weighting in the Patent Document Categorization Subtask at NTCIR-6

Authors

  • Masaki Murata
  • Toshiyuki Kanamaru
  • Tamotsu Shirado
  • Hitoshi Isahara
Abstract

Patent processing is important in industry, business, and law. We participated in the classification subtask of the NTCIR-6 Patent Retrieval Task, in which we classified patent documents into their F-terms using the k-nearest neighbor method. For document classification, F-term categories are both very precise and useful. We entered five systems in the classification subtask and obtained good results with them, confirming the effectiveness of our method. By comparing various similarity calculation methods, we confirmed that the SMART weighting scheme was the most effective in our experiments.
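The general idea of k-nearest neighbor categorization over weighted term vectors can be sketched as follows. This is only an illustrative sketch, not the authors' system: the abstract does not give the exact SMART formula, so the code assumes the common "ltc" variant of SMART notation (logarithmic term frequency, inverse document frequency, cosine normalization). The function names, the similarity-weighted voting rule, the `threshold` parameter, and the toy documents and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ltc_weight(tokens, df, n_docs):
    """Assumed SMART 'ltc' weighting: (1 + ln tf) * ln(N / df), cosine-normalized."""
    tf = Counter(tokens)
    vec = {
        term: (1.0 + math.log(count)) * math.log(n_docs / df[term])
        for term, count in tf.items()
        if term in df  # terms unseen in training carry no weight
    }
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in vec.items()}


def cosine(a, b):
    """Dot product of two already-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def knn_labels(query_tokens, train_docs, train_labels, k=3, threshold=0.5):
    """Assign every label whose similarity-weighted share among the k nearest
    training documents reaches `threshold` (a hypothetical voting rule)."""
    n_docs = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))
    train_vecs = [ltc_weight(doc, df, n_docs) for doc in train_docs]
    query_vec = ltc_weight(query_tokens, df, n_docs)

    neighbors = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(train_vecs)),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0.0:
        return []

    scores = defaultdict(float)
    for sim, i in neighbors:
        for label in train_labels[i]:
            scores[label] += sim
    return sorted(label for label, s in scores.items() if s / total >= threshold)


# Toy data (hypothetical; real F-term codes and patent text are not shown here).
TRAIN_DOCS = [
    ["engine", "piston", "fuel"],
    ["engine", "valve", "fuel"],
    ["display", "pixel", "screen"],
    ["display", "screen", "backlight"],
]
TRAIN_LABELS = [{"A"}, {"A", "B"}, {"C"}, {"C"}]

if __name__ == "__main__":
    print(knn_labels(["engine", "fuel", "piston"], TRAIN_DOCS, TRAIN_LABELS, k=2))
```

Because a patent can carry several F-terms, the sketch returns a set of labels rather than a single class; lowering `threshold` admits labels supported by weaker neighbors.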

Similar resources

Using the K Nearest Neighbor Method and BM25 in the Patent Document Categorization Subtask at NTCIR-5

Patent processing is extremely important for industry, business, and law. We participated in the F-term categorization subtask at NTCIR-5, in which we classified patent documents into their F-terms using the k-nearest neighbor method. For document classification, F-term categories are both very precise and useful. We entered five systems in the F-term categorization subtask. They obtained the ...

A KNN Research Paper Classification Method Based on Shared Nearest Neighbor

Patents cover nearly all of the latest and most active innovative technical information across technical fields, so patent classification has great application value in the patent research domain. This paper presents a KNN text categorization method based on shared nearest neighbor, effectively combining the BM25 similarity calculation method and the neighborhood information of samples. Th...

Automatic Categorization of Japanese Patents based on Surrogate Texts

This paper describes our work at the fifth NTCIR workshop on the subtask of patent classification. We use KNN (K-Nearest Neighbors) as our classifier and the English PAJ (Patent Abstract Japan) as the patent surrogate for classification. Based on the knowledge and experience learned from our previous experiments with other document collections, we leverage on the parameters to achieve above-ave...

Justsystem at NTCIR-5 Patent Classification

Justsystem participated in Patent Classification Subtask at the Fifth NTCIR workshop. This paper overviews our machine learning-based patent application classification system. Straightforward application of Naive Bayes classifier was effective in theme categorization subtask that has a non-hierarchical category structure. In F-term categorization subtask, we regarded the complicated F-term cate...

Overview of Patent Retrieval Task at NTCIR-5

In the Fifth NTCIR Workshop, we organized the Patent Retrieval Task and performed three subtasks: Document Retrieval, Passage Retrieval, and Classification. This paper describes the Document Retrieval Subtask and Passage Retrieval Subtask, both of which were intended for the patent-to-patent invalidity search task. We show the evaluation results of the groups participating in those subtasks.

Publication date: 2007